Shanghai AI Lab, in collaboration with the Corpus Data Alliance, has open-sourced 'Shusheng·Wanjuan' 1.0, a multimodal pre-training dataset comprising text, image, and video subsets that total more than 2 TB. The data has undergone fine-grained cleaning and deduplication, and its stated strengths are integration across multiple dimensions, meticulous processing, and ease of use. The release is expected to promote the application of and innovation in large models and to lower the technical barrier to working with them.